support TORCH_CUDA_ARCH_LIST and avoid link against libcuda.so at compile time #245
Conversation
@winggan Thanks man, it worked!! Can you please keep your branch updated for anyone who wants to try this PR:
It has been a while since I created this PR, which covered most of the compile-time issues at that time. Could you briefly explain what has been updated since then and how I can help? I haven't been keeping up with the project for a while due to work commitments.
support TORCH_CUDA_ARCH_LIST so we can compile the wheel in a non-GPU environment, making it more convenient to work with CI/CD and prebuilt binary distribution
Now we can build SageAttention in a CPU-only environment, for example:

```shell
CUDA_HOME=/path/to/cuda-x.y TORCH_CUDA_ARCH_LIST='8.0;9.0+PTX' python3 setup.py bdist_wheel
```

`libcuda.so` (the driver API) is not available in a non-GPU environment, so we should avoid linking against it directly at compile time.
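For illustration, here is a minimal sketch of how a setup.py can turn a `TORCH_CUDA_ARCH_LIST`-style string into nvcc `-gencode` flags without probing a physical GPU. The helper name `arch_list_to_gencode` is hypothetical, not the actual code in this PR; PyTorch's `torch.utils.cpp_extension` performs a similar translation internally.

```python
import os

def arch_list_to_gencode(arch_list: str) -> list[str]:
    """Translate a TORCH_CUDA_ARCH_LIST string (e.g. '8.0;9.0+PTX')
    into nvcc -gencode flags, with no GPU query needed."""
    flags = []
    for arch in arch_list.replace(" ", ";").split(";"):
        if not arch:
            continue
        # A '+PTX' suffix asks for PTX to be embedded as well, for JIT
        # compilation on future architectures.
        ptx = arch.endswith("+PTX")
        num = arch.removesuffix("+PTX").replace(".", "")  # '8.0' -> '80'
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if ptx:
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags

# Honor the environment variable when set, falling back to a default list.
arch_env = os.environ.get("TORCH_CUDA_ARCH_LIST", "8.0;9.0+PTX")
print(arch_list_to_gencode(arch_env))
```

With `TORCH_CUDA_ARCH_LIST='8.0;9.0+PTX'` this emits SASS for sm_80 and sm_90 plus PTX for compute_90, which is what makes a build on a GPU-less CI machine possible.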
NVIDIA offers a standard way to access the driver API via `cudaGetDriverEntryPointByVersion` (or, in earlier CUDA versions, `cudaGetDriverEntryPoint`), which dynamically resolves and calls driver API symbols at runtime.
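The same idea — resolving `libcuda.so` at runtime instead of linking against it at build time — can be illustrated in Python with `ctypes`. This is only an analogy for the C++ mechanism the PR uses via `cudaGetDriverEntryPoint`; the point is that a missing driver becomes a recoverable runtime condition rather than a link-time failure.

```python
import ctypes

def try_load_driver():
    """Attempt to dlopen libcuda.so at runtime. Returns the library
    handle, or None on a machine without the NVIDIA driver -- instead
    of failing at link time the way a compile-time -lcuda would."""
    for name in ("libcuda.so.1", "libcuda.so"):
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None

driver = try_load_driver()
if driver is None:
    print("no libcuda.so: driver API unavailable, but the program still runs")
else:
    # cuDriverGetVersion is a real driver-API entry point.
    version = ctypes.c_int(0)
    driver.cuDriverGetVersion(ctypes.byref(version))
    print("driver API version:", version.value)
```

On a CPU-only CI machine this prints the fallback message and exits cleanly, which is exactly the behavior a prebuilt wheel needs.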